September 13, 2016

Explore: Visualize

Agenda

  • Grammar of Graphics

  • Using ggplot2

This presentation is based on the ggplot2 tutorial written by Jennifer Bryan.

ggplot2

  • ggplot2 is a data visualization package

  • created by Hadley Wickham, first released on CRAN in 2007

  • implements Leland Wilkinson’s (1999) Grammar of Graphics scheme

  • a core part of the tidyverse collection of packages

Grammar of Graphics

Elements

  • data: The data that you want to visualise

  • aes: Aesthetic mappings describing how variables in the data are mapped to aesthetic attributes

    • horizontal position (x)
    • vertical position (y)
    • colour
    • size
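A minimal sketch of an aesthetic mapping, assuming a recent ggplot2. The toy data frame and its column names (gdp, life, group, pop) are hypothetical. layer_data() returns the data ggplot2 actually draws, so you can see each aesthetic turn into a column of plotting values.

```r
library(ggplot2)

# Hypothetical toy data, for illustration only
df <- data.frame(
  gdp   = c(500, 2000, 8000, 30000),
  life  = c(45, 58, 70, 79),
  group = c("A", "A", "B", "B"),
  pop   = c(1e6, 5e6, 2e7, 8e7)
)

# Map four variables to the four aesthetics listed above
p <- ggplot(df, aes(x = gdp, y = life, colour = group, size = pop)) +
  geom_point()

# After building, each aesthetic is a column: positions, one colour per
# group, and one point size per row
built <- layer_data(p)
built[, c("x", "y", "colour", "size")]
```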

Grammar of Graphics

Elements

  • geoms: Geometric objects that represent what you actually see on the plot
    • points
    • lines
    • polygons
    • bars
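A sketch showing that the geom alone decides what is drawn: the data and mapping stay fixed while the geometric object changes. The toy data are hypothetical. geom_col() draws bars whose heights are the y values, so its built data carry rectangle corners (xmin, xmax, ymin, ymax).

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(x = 1:4, y = c(3, 1, 4, 2))

# Same data and aesthetic mapping for every plot
base <- ggplot(df, aes(x = x, y = y))

p_points <- base + geom_point()   # points
p_lines  <- base + geom_line()    # lines
p_bars   <- base + geom_col()     # bars from 0 up to y

# Bars are drawn as rectangles, so their built data hold the corners
layer_data(p_bars)[, c("xmin", "xmax", "ymin", "ymax")]
```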

Grammar of Graphics

Elements

  • stats: statistical transformations that summarise the data, for example
    • binning and counting observations to create a histogram
    • summarising a 2D relationship with a linear model
    • stats are optional: the identity stat passes the data through unchanged
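A sketch of both example stats, assuming a recent ggplot2 and simulated toy data. stat_bin (behind geom_histogram()) replaces raw observations with one row per bin and a count column; stat_smooth with method = "lm" replaces them with fitted values, which should agree with calling lm() directly.

```r
library(ggplot2)

# Simulated toy data (hypothetical)
set.seed(1)
df <- data.frame(x = 1:20)
df$y <- 2 + 0.5 * df$x + rnorm(20)

# 1. Binning and counting: one row per bin, with a count per bin
p_hist <- ggplot(df, aes(x = y)) + geom_histogram(bins = 5)
hist_data <- layer_data(p_hist)
hist_data[, c("xmin", "xmax", "count")]

# 2. Summarising a 2D relationship with a linear model
p_lm <- ggplot(df, aes(x = x, y = y)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
lm_data <- layer_data(p_lm)

# The stat's fitted values match an ordinary lm() fit
fit <- lm(y ~ x, data = df)
all.equal(lm_data$y, unname(predict(fit, newdata = data.frame(x = lm_data$x))))
```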

Grammar of Graphics

Elements

  • scales: map values in the data to values of an aesthetic (e.g. continent to colour) and provide the axes and legends used to read the plot

  • coord: a coordinate system that describes how data coordinates are mapped to the plane of the graphic.

  • facet: a faceting specification describes how to break up the data into subsets and draw each subset in its own panel.
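A sketch putting one scale, one coord and one facet into a single plot, assuming a recent ggplot2; the toy data are hypothetical. coord_cartesian() is the default coordinate system and is written out only to show where a coord goes. The built data show the scale's effect (x positions on the log10 scale) and the facet's effect (a PANEL column).

```r
library(ggplot2)

# Hypothetical toy data spanning several orders of magnitude
df <- data.frame(
  x = c(10, 100, 1000, 10000),
  y = c(1, 2, 3, 4),
  g = c("a", "a", "b", "b")
)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  scale_x_log10() +     # scale: data values -> log10 positions on the x axis
  coord_cartesian() +   # coord: the ordinary Cartesian plane (the default)
  facet_wrap(~ g)       # facet: one panel per level of g

built <- layer_data(p)
built$x      # x positions, now on the log10 scale
built$PANEL  # the facet panel each point is drawn in
```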

Grammar of Graphics

Layers

A layer is composed of four parts:

  • data and aesthetic mapping
  • a statistical transformation (stat)
  • a geometric object (geom)
  • a position adjustment

A plot is built by adding layers, with +, to a base ggplot() object
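A sketch of the four parts of a layer, assuming a recent ggplot2 and hypothetical toy data. ggplot2's layer() function takes the stat, geom and position explicitly, while data and mapping are inherited from ggplot(). geom_line() is shown as the usual shortcut that fills in those parts with defaults.

```r
library(ggplot2)

df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 7))  # hypothetical toy data

# A layer with all parts spelled out: data and mapping come from ggplot(),
# plus an explicit geom, stat and position adjustment
point_layer <- layer(
  geom = "point", stat = "identity", position = "identity",
  params = list(na.rm = FALSE)
)

# Layers are stacked onto the base plot with +
p <- ggplot(df, aes(x = x, y = y)) +
  point_layer +
  geom_line()

nrow(layer_data(p, 1))  # rows drawn by the point layer
nrow(layer_data(p, 2))  # rows drawn by the line layer
```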

ggplot2

Load the package

library(ggplot2)

Load the data from the gapminder package

library(gapminder)
str(gapminder)
## Classes 'tbl_df', 'tbl' and 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Scatterplot

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
    geom_point()
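A common idiom, sketched here: store the data and mapping once, then add layers to the stored object. The names p and p_scatter are arbitrary. One point should be drawn per row of gapminder.

```r
library(ggplot2)
library(gapminder)

# Data + aesthetic mapping, with no layers yet
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))

# Add a point layer to the stored object
p_scatter <- p + geom_point()

nrow(layer_data(p_scatter))  # one point per country-year
```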

Data transformation, quick and dirty

ggplot(gapminder, aes(x = log10(gdpPercap), y = lifeExp)) +
    geom_point()

Data transformation, via a log scale

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
    geom_point() +
    scale_x_log10()
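A sketch comparing the two approaches, assuming a recent ggplot2. Both place every point at the same x position; the difference is the axis. With log10() inside aes() the axis is labelled in log10 units, while scale_x_log10() keeps breaks and labels in the original gdpPercap units.

```r
library(ggplot2)
library(gapminder)

# Transform the data inside aes(): axis shows log10 units
p_data <- ggplot(gapminder, aes(x = log10(gdpPercap), y = lifeExp)) +
  geom_point()

# Transform with a scale: axis labels stay in gdpPercap units
p_scale <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

# Same drawn positions in both plots
all.equal(layer_data(p_data)$x, layer_data(p_scale)$x)
```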

Color by continent

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + 
    geom_point() +
    scale_x_log10()
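A sketch of what the colour mapping does to the built data, assuming a recent ggplot2. Mapping a discrete variable gives one colour per continent and also splits the data into one group per continent. That grouping is what later makes geom_smooth() fit one curve per continent.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10()

built <- layer_data(p)
length(unique(built$colour))  # one colour per continent
length(unique(built$group))   # one group per continent
```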

Transparency

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
    geom_point(alpha = (1/3), size = 3) + 
    scale_x_log10() 
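Here alpha and size are set, not mapped: they are constants passed to geom_point() outside aes(), so every point gets the same value and no legend is drawn. Writing aes(alpha = 1/3) instead would create a mapping and a legend. A sketch checking the built data, assuming a recent ggplot2:

```r
library(ggplot2)
library(gapminder)

# Constants outside aes() apply to every point
p_set <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 1/3, size = 3) +
  scale_x_log10()

built <- layer_data(p_set)
unique(built$alpha)  # a single value for every point
unique(built$size)   # a single value for every point
```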

Curves

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
    geom_point(alpha = (1/3), size = 3) + 
    scale_x_log10() +
    geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
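The message reports the method and formula that geom_smooth() chose by default (a gam, since the data have more than 1,000 rows). A sketch that passes them explicitly, which should reproduce that choice and suppress the message; it assumes the mgcv package, which supplies gam() and s(), is installed.

```r
library(ggplot2)
library(gapminder)
library(mgcv)  # provides gam() and s()

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 1/3, size = 3) +
  scale_x_log10() +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

smooth <- layer_data(p, 2)
# Fitted curve (y) plus the confidence band (ymin, ymax)
head(smooth[, c("x", "y", "ymin", "ymax")])
```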

Curves

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
    geom_point(alpha = (1/3), size = 3) + 
    scale_x_log10() +
    geom_smooth(lwd = 3, se = FALSE)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
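With se = FALSE the stat computes no standard errors, so the built data contain only the fitted curve and no ymin or ymax columns for the confidence band. A sketch, again spelling out the default gam method and assuming mgcv is installed:

```r
library(ggplot2)
library(gapminder)
library(mgcv)  # provides gam() and s()

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 1/3, size = 3) +
  scale_x_log10() +
  geom_smooth(se = FALSE, method = "gam", formula = y ~ s(x, bs = "cs"))

smooth <- layer_data(p, 2)
names(smooth)  # no ymin / ymax: the band is not computed
```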

Curves

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + 
    geom_point(alpha = (1/3), size = 3) + 
    scale_x_log10() +
    geom_smooth(lwd = 3, se = FALSE, method = "lm")
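Scale transformations happen before stats are computed, so with scale_x_log10() this straight line is the least-squares fit of lifeExp on log10(gdpPercap), not on gdpPercap. A sketch that checks this against lm(); formula = y ~ x is added only to silence the formula message.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 1/3, size = 3) +
  scale_x_log10() +
  geom_smooth(se = FALSE, method = "lm", formula = y ~ x)

line <- layer_data(p, 2)   # x here is already log10(gdpPercap)

# The same fit done by hand on the log10 scale
fit <- lm(lifeExp ~ log10(gdpPercap), data = gapminder)
b <- unname(coef(fit))
all.equal(line$y, b[1] + b[2] * line$x)
```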

Curves

Return to continents:

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + 
    geom_point(alpha = (1/3), size = 3) + 
    scale_x_log10() +
    geom_smooth(lwd = 3, se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
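Because color = continent is mapped in ggplot(), every layer inherits it, so geom_smooth() fits one curve per continent. A sketch contrasting that with mapping colour only in geom_point(), which keeps coloured points but gives the smoother a single group. method = "loess" with formula = y ~ x is spelled out only to avoid the message.

```r
library(ggplot2)
library(gapminder)

# Colour inherited by every layer: one smooth per continent
p_each <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 1/3, size = 3) +
  scale_x_log10() +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x)

# Colour mapped only for the points: one smooth for all the data
p_one <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent), alpha = 1/3, size = 3) +
  scale_x_log10() +
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x)

length(unique(layer_data(p_each, 2)$group))  # one group per continent
length(unique(layer_data(p_one, 2)$group))   # a single group
```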

Facets

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + 
    geom_point(alpha = (1/3), size = 3) + 
    scale_x_log10() +
    facet_wrap(~continent)
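A sketch of how faceting splits the rows, assuming a recent ggplot2. Each row is assigned to one panel, recorded in the PANEL column of the built data. nrow = 1 is a real facet_wrap() argument, used here only as an example layout choice that puts all panels in one row.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 1/3, size = 3) +
  scale_x_log10() +
  facet_wrap(~ continent, nrow = 1)   # example layout: one row of panels

built <- layer_data(p)
table(built$PANEL)  # number of points drawn in each panel
```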

Facets

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) + 
    geom_point(alpha = (1/3), size = 3) + 
    scale_x_log10() +
    facet_wrap(~ continent) +
    geom_smooth(lwd = 2, se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
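A sketch of the same faceted plot with the smoother's method and formula passed explicitly (the ones reported in the message), so no message is printed. Each panel gets its own curve fitted only to that continent's rows.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 1/3, size = 3) +
  scale_x_log10() +
  facet_wrap(~ continent) +
  geom_smooth(lwd = 2, se = FALSE, method = "loess", formula = y ~ x)

smooth <- layer_data(p, 2)
table(smooth$PANEL)  # fitted points per panel: one curve per continent
```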